Implementation of VTLN for statistical speech synthesis
Abstract
Vocal tract length normalization (VTLN) is an important feature normalization technique that can be used for speaker adaptation when very little adaptation data is available. Earlier work showed that VTLN can be applied to statistical speech synthesis and that its improvements are additive to those of constrained maximum likelihood linear regression (CMLLR). This paper presents an expectation-maximization (EM) optimization for estimating more accurate warping factors. The EM formulation embeds the feature normalization in HMM training, which makes warping-factor estimation more efficient and allows multiple (appropriate) warping factors to be used for different state clusters of the same speaker.
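To make the notion of a warping factor concrete, here is a minimal Python sketch. It assumes a bilinear (first-order all-pass) frequency warp, which is one common choice for VTLN; the abstract does not say which warping function the paper uses. The brute-force grid search is a simple stand-in for illustration only and is not the EM procedure the paper proposes. The names `bilinear_warp` and `grid_search_alpha` and the toy peak values are hypothetical.

```python
import math

def bilinear_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, 0..pi).

    Assumption: a first-order all-pass (bilinear) warp with |alpha| < 1.
    alpha = 0 leaves the frequency axis unchanged.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def grid_search_alpha(source_peaks, target_peaks, alphas):
    """Return the alpha whose warp maps source peaks closest to target peaks.

    Toy criterion (squared error on spectral peak positions); real systems
    score warping factors by model likelihood instead.
    """
    def cost(alpha):
        return sum((bilinear_warp(s, alpha) - t) ** 2
                   for s, t in zip(source_peaks, target_peaks))
    return min(alphas, key=cost)

# Toy data: target peaks are the source peaks warped with alpha = 0.07.
alphas = [a / 100.0 for a in range(-20, 21)]
source = [0.3, 0.9, 1.8]
target = [bilinear_warp(w, 0.07) for w in source]
best = grid_search_alpha(source, target, alphas)
print(best)
```

A grid search like this evaluates every candidate factor separately. The EM formulation described in the abstract instead estimates the factor inside HMM training.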
Related resources
Framework Of Feature Based Adaptation For Statistical Speech Synthesis And Recognition
The advent of statistical parametric speech synthesis has paved new ways to a unified framework for hidden Markov model (HMM) based text to speech synthesis (TTS) and automatic speech recognition (ASR). The techniques and advancements made in the field of ASR can now be adopted in the domain of synthesis. Speaker adaptation is a well-advanced topic in the area of ASR, where the adaptation data ...
VTLN-Based Rapid Cross-Lingual Adaptation for Statistical Parametric Speech Synthesis
Cross-lingual speaker adaptation (CLSA) has emerged as a new challenge in statistical parametric speech synthesis, with specific application to speech-to-speech translation. Recent research has shown that reasonable speaker similarity can be achieved in CLSA using maximum likelihood linear transformation of model parameters, but this method also has weaknesses due to the inherent mismatch cause...
Combining Vocal Tract Length Normalization with Linear Transformations in a Bayesian Framework
Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. By contrast, with just a single parameter, VTLN c...
New method for rapid vocal tract length adaptation in HMM-based speech synthesis
We present a new method to rapidly adapt the models of a statistical synthesizer to the voice of a new speaker. We apply a relatively simple linear transform that consists of a vocal tract length normalization (VTLN) part and a long-term average cepstral correction part. Despite the logical limitations of this approach, we will show that it effectively reduces the gap between source and target ...